Machine Learning Big Data Framework and Analytics for Big Data Problems

Authors

  • Shafaatunnur Hasan
  • Siti Mariyam Shamsuddin
  • Noel Lopes
Abstract

Big data computing generally deals with massive, high-dimensional data such as DNA microarray data, financial data, medical imagery, satellite imagery and hyperspectral imagery. It therefore needs advanced technologies or methods that reduce computational time while extracting valuable information without information loss. In this context, Machine Learning (ML) algorithms are widely used to learn and extract useful information from large volumes of data. However, ML algorithms such as Neural Networks are computationally expensive, and the central processing unit (CPU) typically cannot cope with these requirements. High-performance hardware such as the Graphics Processing Unit (GPU) is therefore needed to reach solutions faster. GPUs provide remarkable performance gains compared to CPUs, and they are relatively inexpensive, widely available and scalable. Since 2006, NVIDIA has simplified the GPU programming model with the Compute Unified Device Architecture (CUDA), which supports accessible programming interfaces and industry-standard languages such as C and C++. Since then, ML algorithms running on General-Purpose Graphics Processing Units (GPGPU) have been applied in various applications, including signal and image pattern classification in the biomedical area. The importance of fast cancer versus non-cancer detection motivates this study. Accordingly, we propose a machine learning framework and analytics based on the Self-Organizing Map (SOM) and Multiple Back-Propagation (MBP) for big biomedical data classification problems. Big data such as gene expression datasets are processed on a high-performance computer and on Fermi-architecture graphics hardware. In our experiments, MBP and SOM on a Tesla GPU achieved faster computing times than the high-performance computer, with feasible results in terms of speed performance.
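For readers unfamiliar with the Self-Organizing Map named in the abstract, the following is a minimal CPU-only sketch of the standard SOM training loop in plain Python. It is not the authors' GPU implementation: the function names `train_som` and `best_matching_unit`, the exponential decay schedule and all default parameters are assumptions chosen for illustration. The nested loop over map nodes is the part a GPU version would typically parallelise.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    best, best_dist = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols map on data (list of equal-length vectors)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0  # assumed initial radius
    # One random weight vector per map node, with components in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data:
            # Learning rate and neighbourhood radius shrink as training proceeds.
            frac = step / total_steps
            lr = lr0 * math.exp(-frac)
            sigma = sigma0 * math.exp(-frac)
            br, bc = best_matching_unit(weights, x)
            # Pull every node toward x, weighted by its grid distance to the BMU.
            for r in range(rows):
                for c in range(cols):
                    grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_dist2 / (2.0 * sigma * sigma))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


if __name__ == "__main__":
    # Toy two-feature samples standing in for expression profiles (made-up values).
    samples = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
    som = train_som(samples, rows=3, cols=3)
    for s in samples:
        print(s, "->", best_matching_unit(som, s))
```

After training, each sample is assigned to its best matching unit, and a classifier can be built by labelling map nodes with the classes of the training samples that land on them. Every node update within one step is independent of the others, which is why this algorithm suits GPU execution.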

Similar resources

P-V-L Deep: A Big Data Analytics Solution for Now-casting in Monetary Policy

The development of new technologies has confronted the entire domain of science and industry with issues of big data's scalability as well as its integration with the purpose of forecasting analytics in its life cycle. In predictive analytics, the forecast of near-future and recent past - or in other words, the now-casting - is the continuous study of real-time events and constantly updated whe...

Big Data Analytics and Now-casting: A Comprehensive Model for Eventuality of Forecasting and Predictive Policies of Policy-making Institutions

The ability of now-casting and eventuality is the most crucial and vital achievement of big data analytics in the area of policy-making. To recognize the trends and to render a real image of the current condition and alarming immediate indicators, the significance and the specific positions of big data in policy-making are undeniable. Moreover, the requirement for policy-making institutions to ...

A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection

Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....

Handling Big Data Stream Analytics using SAMOA Framework - A Practical Experience

Data analytics and machine learning have always been of great importance in almost every field, especially in business decision making and strategy building, in the healthcare domain, in text mining and pattern identification on the web, in meteorology, etc. The daily exponential growth of data today has shifted the normal data analytics to new paradigm of Big Data Analytics and Big Dat...

Application of Big Data Analytics in Power Distribution Network

Smart grid enhances optimization in generation, distribution and consumption of the electricity by integrating information and communication technologies into the grid. Today, utilities are moving towards smart grid applications, most common one being deployment of smart meters in advanced metering infrastructure, and the first technical challenge they face is the huge volume of data generated ...

Big Data Analytics in Bioinformatics: A Machine Learning Perspective

Bioinformatics research is characterized by voluminous and incremental datasets and complex data analytics methods. The machine learning methods used in bioinformatics are iterative and parallel. These methods can be scaled to handle big data using the distributed and parallel computing technologies. Usually big data tools perform computation in batch-mode and are not optimized for iterative pr...

Publication date: 2014